Lexical acquisition from corpora: the case of subcategorization frames in French
Abstract
This paper presents a method for automatically acquiring a syntactic lexicon of subcategorization frames for French verbs directly from large corpora. We evaluate the method against existing lexical resources and show that the system produces new frames that were not previously recorded. Finally, we show that lexico-semantic classes « à la Levin » (1993) can be induced from these data.
KEYWORDS: lexicon, verb, subcategorization, corpus-based acquisition.
Similar resources
A Subcategorization Frames Acquisition System for French Verbs
This paper presents a system intended to automatically acquire subcategorization frames (SCFs) of verbs from the analysis of large corpora. The system has been applied to a newspaper corpus (made of 10 years of the French newspaper Le Monde) and acquired subcategorization information for 3267 verbs. 286 SCFs were dynamically learnt for these verbs. From the analysis of 25 representative verbs, ...
Unsupervised Acquisition of Verb Subcategorization Frames from Shallow-Parsed Corpora
In this paper, we report experiments on the unsupervised automatic acquisition of Italian and English verb subcategorization frames (SCFs) from general and domain corpora. The proposed technique operates on syntactically shallow-parsed corpora, using a limited number of search heuristics that do not rely on any prior lexico-syntactic knowledge about SCFs. Although preliminary, reported res...
A Subcategorization Acquisition System for French Verbs
This paper presents a system capable of automatically acquiring subcategorization frames (SCFs) for French verbs from the analysis of large corpora. We applied the system to a large newspaper corpus (consisting of 10 years of the French newspaper ’Le Monde’) and acquired subcategorization information for 3267 verbs. The system learned 286 SCF types for these verbs. From the analysis of 25 repre...
Corpus-Based Induction of Lexical Representation and Meaning
The acquisition of linguistic knowledge, i.e., the identification, extraction, and encoding of linguistic information in a corpus, has been one of the main motivations for data-driven approaches to natural language. Methods have been developed for the acquisition of, for instance, parts of speech, noun compounds, collocations, support verbs, subcategorization frames, phrase structure rules, selec...
Can Subcategorization Help a Statistical Dependency Parser?
Today there is a relatively large body of work on the automatic acquisition of lexico-syntactic preferences (subcategorization) from corpora. Various techniques have been developed that not only produce machine-readable subcategorization dictionaries but are also capable of weighting the various subcategorization frames probabilistically. Clearly there should be a potential to use such weighted...
Journal: TAL
Volume: 51
Publication date: 2010